Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning

Author

  • Sridhar Mahadevan

Abstract

Research in reinforcement learning (RL) has thus far concentrated on two optimality criteria: the discounted framework, which has been very well studied, and the average-reward framework, in which interest is rapidly increasing. In this paper, we present a framework called sensitive discount optimality which offers an elegant way of linking these two paradigms. Although sensitive discount optimality has been well studied in dynamic programming, with several provably convergent algorithms, it has not received any attention in RL. This framework is based on studying the properties of the expected cumulative discounted reward as the discount factor tends to 1. Under these conditions, the cumulative discounted reward can be expanded using a Laurent series to yield a sequence of terms, the first of which is the average reward, the second of which involves the average-adjusted sum of rewards (or bias), and so on. We use the sensitive discount optimality framework to derive a new model-free average-reward technique, which is related to Q-learning-type methods proposed by Bertsekas, Schwartz, and Singh, but which, unlike these previous methods, optimizes both the first and second terms in the Laurent series (the average reward and the bias values). Statement: This paper has not been submitted to any other conference.
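The abstract's central object is the behavior of the discounted value as the discount factor tends to 1: its Laurent expansion begins with the average reward (gain), followed by the bias. The sketch below illustrates the leading term numerically for a fixed policy on a small Markov reward process; the 2-state chain and rewards are invented for the example and are not taken from the paper.

```python
# Minimal sketch (illustrative, not from the paper): for a fixed policy,
# the discounted value V_gamma = (I - gamma * P)^{-1} r admits the expansion
# V_gamma = rho / (1 - gamma) + h + o(1) as gamma -> 1, where rho is the
# average reward (gain) and h is the bias. We check the leading term:
# (1 - gamma) * V_gamma should approach rho.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition matrix (unichain, so rho is state-independent)
r = np.array([1.0, 0.0])     # per-state rewards

# Stationary distribution pi (left eigenvector of P for eigenvalue 1).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()
rho = pi @ r                 # average reward; here rho = 2/3

for gamma in (0.9, 0.99, 0.999):
    V = np.linalg.solve(np.eye(2) - gamma * P, r)
    print(gamma, (1.0 - gamma) * V)   # approaches [rho, rho] as gamma -> 1
```

The higher-order terms of the expansion (the bias and beyond) are what sensitive discount optimality orders lexicographically; the paper's proposed method optimizes the first two.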


Similar articles

Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results Editor: Leslie Kaelbling

This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms is described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal co...



Blackwell Optimality for Controlled Diffusion Processes

In this paper we study m-discount optimality (m ≥ −1) and Blackwell optimality for a general class of controlled (Markov) diffusion processes. To this end, a key step is to express the expected discounted reward function as a Laurent series, and then search for control policies that lexicographically maximize the m-th coefficient of this series for m = −1, 0, 1, . . . . This approach naturally ...


Hierarchically Optimal Average Reward Reinforcement Learning

Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by a task hierarchy, and a weaker local model called recursive optimality. In this paper, we introduce two new average-reward HRL algorithms for finding hierarchically optimal policies. We compare them to our previously r...


To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning

Most work in reinforcement learning (RL) is based on discounted techniques, such as Q learning, where long-term rewards are geometrically attenuated based on the delay in their occurrence. Schwartz recently proposed an undiscounted RL technique called R learning that optimizes average reward, and argued that it was a better metric than the discounted one optimized by Q learning. In this paper we...




Publication date: 1996